Speech synthesis and prosody modification using segmentation and modelling of the excitation signal

Authors

  • Juana M. Gutiérrez-Arriola
  • Francisco M. Giménez de los Galanes
  • Mohammad Hasan Savoji
  • José Manuel Pardo
Abstract

In previous work we presented a new method for improving the quality of LPC synthetic speech, in which the excitation signal is modelled with a polynomial function followed by an adaptive filter. Because the excitation is described by a mathematical model, this scheme avoids the problems usually associated with prosody control [1], [2]. To reduce storage needs, a segmentation technique was developed that groups together several pitch periods on the basis of spectral similarity. The same coefficient set (for both the polynomial function and the post-processing filter) is used for every period in a segment. These techniques were applied to a coding/decoding task, where the resulting speech quality was promising [1], [2]. In this paper we present results on prosodic modification, i.e. arbitrary changes of duration and fundamental frequency, which show the suitability of these methods for text-to-speech applications. We also present some results of extending the model to unvoiced segments of speech.

1. SOURCE MODEL DESCRIPTION

The original signal is first pitch marked with an algorithm very similar to that defined in [3]. It is then analysed pitch-synchronously, using the Durbin algorithm to calculate the prediction coefficients. The analysis window is a Hamming window two pitch periods long, centred on each pitch mark. The original signal is inverse filtered with these coefficients to obtain the LP excitation signal. For voiced parts, this excitation is modelled by a parametric (polynomial) version of the original excitation signal. In this first version of the system, the unvoiced segments were synthesised using a stochastic function as excitation.

1.1. Polynomial interpolation

We use a 6th-order polynomial waveform model to represent the derivative of the glottal volume velocity waveform [4]. This derivative function is computed by direct integration of the residual, followed by high-pass filtering to zero-centre the resulting signal.
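The integration and zero-centring step described above, together with a 6th-order least-squares fit, can be sketched as follows. This is a minimal illustration and not the authors' implementation: mean removal stands in for the high-pass filter, the fit uses NumPy's `Polynomial.fit` on a time axis normalised to [-1, 1], the toy residual is made-up data, and all function names are hypothetical.

```python
import numpy as np


def integrate_and_centre(residual):
    """Integrate the LP residual (running sum) and remove the mean.

    Assumption: mean removal is a crude stand-in for the high-pass
    filter mentioned in the text.
    """
    integral = np.cumsum(np.asarray(residual, dtype=float))
    return integral - integral.mean()


def fit_period_polynomial(period, order=6):
    """Least-squares polynomial fit over one pitch period.

    Time is normalised to [-1, 1] so the fit stays well conditioned
    and does not depend on the period length in samples.
    """
    t = np.linspace(-1.0, 1.0, len(period))
    return np.polynomial.Polynomial.fit(t, period, deg=order, domain=[-1.0, 1.0])


def resample_period(poly, n_samples):
    """Evaluate the fitted polynomial on a period of n_samples samples."""
    t = np.linspace(-1.0, 1.0, n_samples)
    return poly(t)


# Toy residual for one 80-sample pitch period (made-up data, not from the paper).
rng = np.random.default_rng(0)
n = 80
toy_residual = 0.05 * rng.standard_normal(n)
toy_residual[60] -= 1.0  # negative pulse standing in for the main excitation

target = integrate_and_centre(toy_residual)
poly = fit_period_polynomial(target, order=6)
approx = resample_period(poly, n)      # model of the original period
longer = resample_period(poly, 100)    # same shape on a longer period
```

Evaluating the fitted polynomial on a different number of samples, as in `longer`, is one plausible way a parametric period could be stretched to a new pitch; the paper's own resynthesis procedure is not given in this excerpt.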
The polynomial function is obtained by least-squares curve fitting; a fine-tuning (readjustment) step is then needed to synchronise the pitch marks exactly with the most negative sample.

Figure 1. Polynomial approximation of the residual integral (6th order).

1.2. Equalisation of the synthesised waveform to the original speech by adaptive filtering

An optimum Wiener filter (FIR) is calculated and applied to the synthesised speech to improve the final output quality. This is equivalent to adaptive filtering because the optimum filter changes with each segment. The order of the filter is fixed (between 30 and 50) and determined heuristically. The LPC filtering and FIR filtering steps can be interchanged, so that the FIR filter in effect models the stochastic part of the excitation, completing the polynomial source model. This last configuration is preferred because the LPC filter can smooth out the discontinuities caused by sudden changes in the equalisation coefficients. Some alternatives for modelling the stochastic component have been proposed, e.g. [5], though they do not provide the flexibility needed for prosody modification. In the proposed method the model is unique for stable segments of speech, independently of the fundamental frequency.

1.3. Segmentation of the excitation signal

The original signal was segmented using a simple normalised measure of spectral change in the original waveform given by:

Authors' affiliation: Grupo de Tecnología del Habla, Departamento de Ingeniería Electrónica, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain.

Related resources

Prosody Modification Using Allpass Residual of Speech Signals

In this paper, we attempt to signify the role of phase spectrum of speech signals in acquiring an accurate estimate of excitation source for prosody modification. The phase spectrum is parametrically modeled as the response of an allpass (AP) filter, and the filter coefficients are estimated by considering the linear prediction (LP) residual as the output of the AP filter. The resultant residua...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain concerns enabling computers to produce speech. Speech synthesis grants the computer the human ability to produce speech. The data-based and process-based approaches are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Prosody Modelling for Syllable-based Speech Synthesis

This paper describes the prosody model used in the syllable-based speech synthesizer DEMOSTHENES. It focuses on the segmental structure, especially on the segmentation into rhythm units (prosodic phrases). Relations between prosodic segments and sentence constituents are also discussed.

MeLos: Analysis and Modelling of Speech Prosody and Speaking Style

This thesis addresses the issue of modelling speech prosody for speech synthesis, and presents MeLos: a complete system for the analysis and modelling of speech prosody “the music of speech”. Research into the analysis and modelling of speech prosody has increased dramatically in recent decades, and speech prosody has emerged as a crucial concern for speech synthesis. The issue of speech prosod...

Real Time Prosody Modification

Real time prosody modification involves changing the prosody parameters such as pitch, duration and intensity of speech in real time without affecting the intelligibility and naturalness. In this paper prosody modification is performed using instants of significant excitation (ISE) of the vocal tract system during production of speech. In the conventional prosody modification system the ISE are...


Publication date: 1997